Entity Matching for Intelligent Information Integration
Abstract
Due to the rapid development of information technologies, especially network technologies, business activities have never been as integrated as they are now, and business decision making often requires gathering information from different sources. This dissertation focuses on the problem of entity matching: associating corresponding information elements within or across information systems. Its aim is to provide complete and accurate information for business decision making. Three challenges that may affect entity matching performance are identified: feature selection for entity representation, matching techniques, and search strategy. The dissertation first provides a theoretical foundation for entity matching by connecting it to the similarity and categorization theories developed in cognitive science. These theories provide guidance for tackling the three challenges. First, based on the feature contrast similarity model, we propose a case-study-based methodology for identifying the key features that uniquely identify an entity. Second, we propose a record comparison technique and a multi-layer naïve Bayes model, which correspond respectively to the deterministic and the probabilistic response selection models defined in categorization theory. Experiments show that both techniques are effective in linking deceptive criminal identities. However, the probabilistic matching technique is preferable because it uses a semi-supervised learning method, which requires less human intervention during training. Third, based on the prototype access assumption proposed in categorization theory, we apply an adaptive detection algorithm to entity matching so
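The feature contrast similarity model named in the abstract is commonly written as sim(A, B) = θ·f(A ∩ B) − α·f(A − B) − β·f(B − A), where A and B are feature sets. The following is a minimal sketch of that formula only, not the dissertation's methodology. It assumes the salience function f is plain set cardinality, and the weights, the function name `contrast_similarity`, and the sample records are all hypothetical.

```python
def contrast_similarity(a, b, theta=1.0, alpha=0.5, beta=0.5):
    """Feature contrast similarity between two feature sets.

    Assumption: the salience function f is set cardinality.
    theta weights shared features; alpha and beta weight the
    features found only in a and only in b, respectively.
    """
    a, b = set(a), set(b)
    common = len(a & b)   # features shared by both records
    only_a = len(a - b)   # features present only in a
    only_b = len(b - a)   # features present only in b
    return theta * common - alpha * only_a - beta * only_b


# Hypothetical identity records expressed as sets of (field, value) features.
record_1 = {("name", "john smith"), ("dob", "1970-01-01"), ("city", "tucson")}
record_2 = {("name", "john smith"), ("dob", "1970-01-01"), ("city", "phoenix")}
record_3 = {("name", "mary jones"), ("dob", "1982-05-17"), ("city", "tucson")}

score_12 = contrast_similarity(record_1, record_2)  # two shared features
score_13 = contrast_similarity(record_1, record_3)  # one shared feature
```

With these illustrative weights, records that share more features and differ on fewer receive a higher score, so `record_2` ranks as a closer match to `record_1` than `record_3` does.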
Similar resources
An Intelligent System’s Approach for Revitalization of Brown Fields using only Production Rate Data
State-of-the-art data analysis in production allows engineers to characterize reservoirs using production data. This saves companies large sums that would otherwise be spent on well testing and reservoir simulation and modeling. There are two shortcomings with today’s production data analysis: it needs bottom-hole or well-head pressure data in addition to data for rating reservoirs’ characteri...
3D Classification of Urban Features Based on Integration of Structural and Spectral Information from UAV Imagery
Three-dimensional classification of urban features is one of the important tools for urban management and the basis of many analyses in photogrammetry and remote sensing. It is therefore used in many applications such as planning, urban management and disaster management. In this study, dense point clouds extracted by dense image matching are used for classification in urban areas. Appl...
Adaptive Approximate Record Matching
Typographical data entry errors and incomplete documents produce imperfect records in real-world databases. These errors generate distinct records which belong to the same entity. The aim of Approximate Record Matching is to find multiple records which belong to an entity. In this paper, an algorithm for Approximate Record Matching is proposed that can be adapted automatically with input error...
Semantic Enrichment in Ontologies for Matching
Matching (or mapping) between heterogeneous ontologies becomes crucial for interoperability in distributed and intelligent environments. Although many efforts in ontology mapping have already been conducted, most of them rely heavily on the meaning of entity names rather than the semantics defined in ontologies. In order to deal with semantic heterogeneity, we enrich the semantics of ontologies...
A Name-Matching Algorithm for Supporting Ontology Enrichment
Ontologies are widely used for capturing and organizing knowledge of a particular domain of interest. This knowledge is usually evolvable and therefore an ontology maintenance process is required. In the context of ontology maintenance we tackle the problem that arises when an instance/individual is written differently (grammatically, orthographically, lexicographically), while representing the...